Memory Matters: Convolutional Recurrent Neural Network for Scene Text Recognition
نویسندگان
چکیده
Text recognition in natural scene is a challenging problem due to the many factors affecting text appearance. In this paper, we presents a method that directly transcribes scene text images to text without needing of sophisticated character segmentation. We leverage recent advances of deep neural networks to model the appearance of scene text images with temporal dynamics. Specifically, we integrates convolutional neural network (CNN) and recurrent neural network (RNN) which is motivated by observing the complementary modeling capabilities of the two models. The main contribution of this work is investigating how temporal memory helps in an segmentation free fashion for this specific problem. By using long short-term memory (LSTM) blocks as hidden units, our model can retain long-term memory compared with HMMs which only maintain short-term state dependences. We conduct experiments on Street View House Number dataset containing highly variable number images. The results demonstrate the superiority of the proposed method over traditional HMM based methods.
منابع مشابه
Reading Scene Text with Attention Convolutional Sequence Modeling
Reading text in the wild is a challenging task in the field of computer vision. Existing approaches mainly adopted Connectionist Temporal Classification (CTC) or Attention models based on Recurrent Neural Network (RNN), which is computationally expensive and hard to train. In this paper, we present an end-to-end Attention Convolutional Network for scene text recognition. Firstly, instead of RNN...
متن کاملSpeech Emotion Recognition Using Scalogram Based Deep Structure
Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...
متن کاملScene Text Recognition with Sliding Convolutional Character Models
Scene text recognition has attracted great interests from the computer vision and pattern recognition community in recent years. State-of-the-art methods use concolutional neural networks (CNNs), recurrent neural networks with long short-term memory (RNN-LSTM) or the combination of them. In this paper, we investigate the intrinsic characteristics of text recognition, and inspired by human cogni...
متن کاملSmart Library: Identifying Books in a Library using Richly Supervised Deep Scene Text Reading
Physical library collections are valuable and long standing resources for knowledge and learning. However, managing books in a large bookshelf and finding books on it often leads to tedious manual work, especially for large book collections where books might be missing or misplaced. Recently, deep neural models, such as Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) hav...
متن کاملSqueezedText: A Real-time Scene Text Recognition by Binary Convolutional Encoder-decoder Network
A new approach for real-time scene text recognition is proposed in this paper. A novel binary convolutional encoderdecoder network (B-CEDNet) together with a bidirectional recurrent neural network (Bi-RNN). The B-CEDNet is engaged as a visual front-end to provide elaborated character detection, and a back-end Bi-RNN performs characterlevel sequential correction and classification based on learn...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1601.01100 شماره
صفحات -
تاریخ انتشار 2016